Poster: Brad Leblanc Date: Mar 8, 2005 2:52am
Forum: etree Subject: Re: Why doesn't the archive allow files to resume?

Sure, if you look at the URL used for the ZIP on this item:

http://www.archive.org/compress/sci2000-01-27.dsbd.shnf

And for this item:

http://www.archive.org/download/sci1998-04-16.flac16/sci1998-04-16.flac16_flac.zip

The "compress" vs. "download" in the URL is your tip-off as to whether the ZIP is built on the fly or already exists.

HTH

-Brad

This post was modified by Brad Leblanc on 2005-03-08 10:52:00

Poster: Michael Birk Date: Mar 8, 2005 2:45am
Forum: etree Subject: Re: Why doesn't the archive allow files to resume?

Using HTTP, I was unable to resume an aborted download of either one of those files. More specifically, both responses ignored the HTTP "Range" header and returned "200 OK" with the full file rather than the "206 Partial Content" that a resume depends on.
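
For anyone who wants to repeat this check, here is a minimal Python sketch of the client side. It only builds the request and interprets a response; it does not contact any server. The function names and the example.invalid URL are made up for illustration and are not anything the Archive provides.

```python
from urllib.request import Request


def build_resume_request(url: str, offset: int) -> Request:
    """Build (but do not send) a GET asking for the bytes from `offset` onward."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    # "bytes=N-" means: everything from byte N to the end of the file.
    return Request(url, headers={"Range": f"bytes={offset}-"})


def range_was_honored(status: int, headers: dict) -> bool:
    """True only if the reply is a partial response that can be appended
    to the bytes already on disk."""
    names = {name.lower() for name in headers}
    # A 200 here means the server ignored Range and is resending the whole file.
    return status == 206 and "content-range" in names


if __name__ == "__main__":
    # Hypothetical URL, used only to show the header that would be sent.
    req = build_resume_request("http://example.invalid/show.zip", 1000)
    print(req.get_header("Range"))
```

If `range_was_honored` comes back false, the safe thing for a client to do is discard the partial file and start over, which matches what I saw with both URLs.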

I did notice some differences between the two, however. The first one (with the "compress" in the URL) returned a non-standard "X-Content-Minimum-Length" header rather than the (important) "Content-Length" header.

Does "resume" for these files only work with FTP? Let me know if there is anything I can do to help get HTTP-based resume working. If disk space is an issue (obviously you have tons ;-), it may be better to fix the zip-on-the-fly script rather than statically creating all of those .zip files.

Poster: Brad Leblanc Date: Mar 8, 2005 4:06am
Forum: etree Subject: Re: Why doesn't the archive allow files to resume?

> it may be better to fix the zip-on-the-fly script rather than statically creating all of those .zip files.

Last time it was discussed (off-forum) we wanted to do away with it because it is very CPU-intensive and does not scale well. Imagine 150 people downloading 150 different recordings from one server with a single 1.8 GHz CPU, all using "Zip on the Fly": that's 150 simultaneous threads not only transferring data but compressing it at the same time. :)

> Does "resume" for these files only work with FTP?

I don't know the answer to this. I think other fans have had success resuming zip files (ones that are complete and *not* on-the-fly) over HTTP here, but I never use it. IMO, FTP is a much better solution for transferring large files. It always resumes, and it easily allows you to queue up a bunch of items and walk away.

> If disk space is an issue

We're moving to one of these later in 2005:

http://www.archive.org/web/petabox.php

Early estimates are between 500 and 1000 terabytes (1024TB = 1 petabyte). Possibly bigger. The current LMA collection is somewhere between 20 and 35 terabytes. Not all of that room is being assigned to LMA expansion, but you get the idea... :)

-Brad

This post was modified by Brad Leblanc on 2005-03-08 12:06:13

Poster: Michael Birk Date: Mar 8, 2005 6:22am
Forum: etree Subject: Re: Why doesn't the archive allow files to resume?

Well, my apologies if I am re-hashing an old conversation. However, a few points:

It should be possible to do scalable, on-the-fly zipping that supports resumption. There is no need to use ZIP file compression, since these audio files are already compressed (with MP3, OGG, Shorten, or Flac). Without the ZIP compression, it should not be CPU-intensive.
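
To make the "no compression" point concrete, here is a rough Python sketch (the Archive's own script is not shown in this thread, so this is not it). It assumes plain ZIP entries with no ZIP64, no extra fields, no data descriptors, and no archive comment. Under those assumptions the size of a stored archive can be computed from the file names and sizes alone, which is what a server would need in order to send a real Content-Length before building anything. The track names are invented.

```python
import io
import zipfile

LOCAL_HEADER = 30   # fixed part of each local file header, in bytes
CENTRAL_ENTRY = 46  # fixed part of each central directory entry
END_RECORD = 22     # end-of-central-directory record without a comment


def stored_zip_size(files):
    """Predict the archive size for an iterable of (name, size) pairs.

    Only valid under the assumptions above: stored entries, no ZIP64,
    no extra fields, no data descriptors, no comments.
    """
    total = END_RECORD
    for name, size in files:
        n = len(name.encode("utf-8"))
        # The name appears twice: in the local header and the central directory.
        total += LOCAL_HEADER + n + size + CENTRAL_ENTRY + n
    return total


def build_stored_zip(contents):
    """Build an uncompressed archive in memory from a {name: bytes} mapping."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        for name, data in contents.items():
            zf.writestr(name, data)
    return buf.getvalue()


if __name__ == "__main__":
    # Made-up track names and filler bytes standing in for audio data.
    tracks = {"d1t01.shn": b"\x00" * 1000, "d1t02.shn": b"\x01" * 2500}
    predicted = stored_zip_size((n, len(d)) for n, d in tracks.items())
    print(predicted == len(build_stored_zip(tracks)))
```

With ZIP_STORED the file bytes are copied through unchanged, so the per-download cost is mostly I/O plus a checksum per file rather than a compression pass.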

HTTP resumption pretty much works for all clients, assuming the server supports it. As we are discussing, it is a bit tricky to implement for dynamic content, but certainly not impossible.
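
As a sketch of the server side, assuming the total size of the generated file is known up front: the Python below handles only the single-range forms that download managers typically send when resuming. It is an illustration of the idea, not a complete implementation of HTTP range handling, and the function names are hypothetical.

```python
import re

_RANGE_RE = re.compile(r"^bytes=(\d*)-(\d*)$")


def resolve_range(header, total):
    """Map a Range header to inclusive (start, end) byte offsets.

    Returns None when the header is absent or not understood, meaning
    "send the whole body with 200". Raises ValueError when the range is
    well-formed but cannot be satisfied, which a server would report as 416.
    """
    match = _RANGE_RE.match(header.strip()) if header else None
    if not match:
        return None
    first, last = match.groups()
    if first == "" and last == "":
        return None
    if first == "":
        # Suffix form "bytes=-N": the last N bytes of the file.
        length = int(last)
        if length == 0:
            raise ValueError("unsatisfiable range")
        return max(total - length, 0), total - 1
    start = int(first)
    end = min(int(last), total - 1) if last else total - 1
    if last and int(last) < start:
        return None  # malformed range: fall back to the full body
    if start >= total:
        raise ValueError("unsatisfiable range")
    return start, end


def partial_headers(start, end, total):
    """Headers that accompany a 206 Partial Content response."""
    return {
        "Content-Range": f"bytes {start}-{end}/{total}",
        "Content-Length": str(end - start + 1),
    }


if __name__ == "__main__":
    start, end = resolve_range("bytes=1000-", 5000)
    print(partial_headers(start, end, 5000))
```

The tricky part for dynamic content is what comes after this: the script has to be able to produce bytes `start` through `end` of the generated file without regenerating everything before them, and the output has to be byte-for-byte identical on every request.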

There are some advantages to HTTP over FTP for content distribution (even large files). In particular, caching is much more straightforward, since the HTTP protocol specifically supports it.

I sent an email last night to info@archive.org offering to help with the on-the-fly-zip. Any chance you will take me up on the offer? If I just implement it as, say, a PHP script, could you use it?

thanks,
mcb

p.s. The petabox looks pretty cool! :-) However, if you store the .zip files, is it really a 500-gigabox?

Poster: Brad Leblanc Date: Mar 8, 2005 10:50am
Forum: etree Subject: Re: Why doesn't the archive allow files to resume?

> It should be possible to do scalable, on-the-fly zipping that supports resumption. There is no need to use ZIP file compression, since these audio files are already compressed (with MP3, OGG, Shorten, or Flac). Without the ZIP compression, it should not be CPU-intensive.

Well, I guess I'm still at the point where I don't see the benefit. If space isn't an issue, what does the on-the-fly stuff gain us?

> I sent an email last night to info@archive.org offering to help with the on-the-fly-zip. Any chance you will take me up on the offer? If I just implement it as, say, a PHP script, could you use it?

I responded to that around 2 or 3 this afternoon, Michael. Not sure why you haven't seen it yet. Let me know if I need to resend.

> If I just implement it as, say, a PHP script, could you use it?

I'm not the person that will be implementing it (I'm just a librarian and middleman for the real engineers), but I guess if you can convince me of why we should use on-the-fly then I will send it to them. If we're retiring it to free up resources (CPU), what does keeping it around help with?

We appreciate your offer to help.

> The petabox looks pretty cool! :-) However, if you store the .zip files, is it really a 500-gigabox?

No, it's a 500,000 gigabox, or a 500 terabox. :) And when that fills up in 10-15 years we'll talk about rolling in another, bigger one. We'll see...

-Brad

This post was modified by Brad Leblanc on 2005-03-08 18:50:31

Poster: Diana Hamilton Date: Mar 8, 2005 11:07pm
Forum: etree Subject: Re: Why doesn't the archive allow files to resume?

> No, it's a 500,000 gigabox, or a 500 terabox. :) And when that fills up in 10-15 years we talk about rolling in another bigger one. We'll see...

Gosh, remember when we had etree01 and etree02 and were imagining in this forum the day we'd be up to etree38 and etree39... "yeah, that will be really cool". Same feeling here and now. :)